Introduction to probability theory

Douwe Molenaar

Systems Biology Lab

2024-11-01

Statistical relations

Observation about chocolate consumption and Nobel prizes (Messerli 2012)

Discuss this case


Topics

  • (random) variables
  • marginal and joint probability distributions
  • (in)dependence of variables and null hypothesis
  • probability and conditional probability
  • statistical and scientific significance
  • mechanism

Basic concepts in probability theory

  • Experiment: a procedure for obtaining one or more observations, ranging from purely observational to highly controlled (e.g. in the lab)
  • Sample space: the set of all possible outcomes
    • Example die: \(\Omega_{die} = \{1,2,3,4,5,6\}\)
  • Event: an observation, or a combination of observations, made when performing an Experiment
    • An Event is a subset of the Sample space
      • Example: observing an odd number of pips \(E_{odd}=\{1,3,5\}\)
    • We can speak of the probability of an Event
      • Example: fair die, odd number of pips, \(P(E_{odd}) = \frac{1}{2}\)
  • Event space: The set of all Events
    • The Event space is a set of sets, specifically a \(\sigma\)-algebra (definition, see syllabus)
    • The Event space includes \(\Omega\), the Certain Event, and \(\varnothing\), the Impossible Event
    • Example: \(\Sigma = \{E_{odd}, E_{even}, \Omega_{die}, \varnothing\} = \{\{1,3,5\}, \{2,4,6\}, \Omega_{die}, \varnothing\}\)
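These definitions translate directly into code. A minimal Python sketch, assuming a fair die: the Sample space and Events are sets, and for equally likely outcomes the probability of an Event is \(|E|/|\Omega|\):

```python
from fractions import Fraction

# Sample space of a die: the set of all possible outcomes
omega = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space
e_odd = {1, 3, 5}
e_even = {2, 4, 6}

def prob(event, sample_space=omega):
    """Probability of an Event for equally likely outcomes: |E| / |Omega|."""
    return Fraction(len(event), len(sample_space))

print(prob(e_odd))   # 1/2
print(prob(omega))   # 1, the Certain Event
print(prob(set()))   # 0, the Impossible Event
```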

Application



Now discuss Sample space, Event space and probability of an Event for the Chocolate data.

Probabilities

What is a probability?

  • We speak of probabilities of Events.
    • probability of observing an even number of pips on a die \(\left(1/2\right)\).
    • probability of observing an even or odd number of pips on a die \((1)\).
    • probability of observing blood glucose concentration in the range \([4,6]\; \text{mM}\) in the general population at 10 AM.

Interpretation of probability

  • Long-term frequency of observing the Event when performing many, many Experiments (the frequentist interpretation)
  • Quantification of our belief that the Event will be observed when we perform an Experiment (the Bayesian interpretation)

Axioms of probability theory

  1. \(P(A) \geq 0\) for any event \(A\)
  2. \(P(\Omega) = 1\)
  3. If \(A\) and \(B\) are disjoint Events then \(P(A \cup B) = P(A) + P(B)\)

Everything that needs to be proven about probabilities can be proven from these axioms together with the axioms of set theory.

Example: prove that \(P(A) + P(\overline{A}) = 1\)

\[ \begin{align*} A \cup \overline{A} = \Omega \quad &\Rightarrow P(A \cup \overline {A}) = P(\Omega) = 1 \\ A \cap \overline{A} = \varnothing \quad &\Rightarrow P(A \cup \overline{A}) = P(A) + P(\overline{A}) \\ &\Rightarrow P(A) + P(\overline{A}) = 1 \end{align*} \]

Conditional probability

A conditional probability is calculated on a subset of the Sample space, for example the Event \(B\), which is treated as the new Sample space.

Definition of the probability of Event \(A\) conditional on Event \(B\)

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

\(P(B)\) is called a normalization term: it re-scales \(P(A \cap B)\) to \([0,1]\)

Examples of conditional probability

Assume: codons have equal probability of being drawn; we make a random draw from the 64 codons (a computational check follows the list)

  1. What is the probability of drawing a codon that encodes Glu?
  2. What is the probability of drawing a codon that encodes Glu under the condition that it starts with a G?
  3. What is the probability that it starts with a C given that it encodes Leu?
  4. What is the probability that it encodes a branched-chain amino acid (Leu, Val, Ile) given that the second nucleotide is a U?
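These probabilities can be checked by enumerating the standard genetic code. A minimal Python sketch, assuming equal codon probabilities as stated above; the 64-letter string lists the one-letter amino acid codes in UCAG codon order, with * marking stop codons, and the results match the answers in the appendix:

```python
# All 64 codons and the standard genetic code, in UCAG order
bases = "UCAG"
codons = [a + b + c for a in bases for b in bases for c in bases]
aas = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
code = dict(zip(codons, aas))  # codon -> one-letter amino acid ('*' = stop)

def p(event):
    """Probability under equal codon probabilities: |event| / 64."""
    return len(event) / 64

glu     = {c for c, a in code.items() if a == "E"}    # Glu codons
leu     = {c for c, a in code.items() if a == "L"}    # Leu codons
bcaa    = {c for c, a in code.items() if a in "LVI"}  # Leu, Val, Ile
g_start = {c for c in codons if c[0] == "G"}
c_start = {c for c in codons if c[0] == "C"}
u_mid   = {c for c in codons if c[1] == "U"}

print(p(glu))                          # 1. 2/64 = 0.03125
print(p(glu & g_start) / p(g_start))   # 2. 1/8 = 0.125
print(p(c_start & leu) / p(leu))       # 3. 2/3 = 0.666...
print(p(bcaa & u_mid) / p(u_mid))      # 4. 13/16 = 0.8125
```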

Product rule and law of total probability

The product rule is derived from the definition of conditional probability

\[ P(A \cap B) = P(A|B) P(B) \]

For Events \(A_1\), \(A_2\), …, \(A_n\) that form a partition set of \(\Omega\), we have

\[ P(B) = \sum_{i=1}^n P(B \cap A_i) = \sum_{i=1}^n P(B|A_i)P(A_i) \]

Examples of product rule and total probability

We discard the assumption of equal probability of codons.

  1. The probability of drawing a codon starting with a C given that it encodes Arg equals \(\frac{2}{3}\). The probability of drawing a codon encoding Arg equals \(\frac{1}{20}\). What is the probability of drawing a codon that encodes Arg and starts with a C?
  2. The probability of drawing a codon starting with U and encoding an amino acid equals \(\frac{2}{30}\). The probability of drawing a codon starting with U and encoding a stop codon equals \(\frac{1}{60}\). What is the probability of drawing a codon starting with U?

Bayes rule

Expressing \(P(A|B)\) in terms of \(P(B|A)\).

Since \(P(A \cap B) = P(A|B) P(B) = P(B|A) P(A)\)

We have

\[ P(A|B) = \frac{P(B|A)P(A)}{P(B)} \]

When given conditional probabilities on a partition set \(\mathcal{F}=\{A_1,\ldots,A_n\}\):

\[ P(A_j|B) = \frac{P(B|A_j)P(A_j)}{\sum_{i=1}^n P(B|A_i)P(A_i)} \]

Example of applying Bayes rule

  • Of all emails that are spam, \(1/5\) contain the word free
  • Of all emails that are not spam, \(1/100\) contain the word free
  • Of all emails, \(2/10\) are spam

Calculate the probability that an email is spam given that it contains the word free.

  • Choose a notation for the events mentioned in the question: e.g. call the Events \(F\) (observe free in an email), \(S\) (email is spam) and \(\overline{S}\) (email is not spam)
  • Identify the (conditional) probabilities in the description
  • Which (conditional) probability are we asked for?
  • Put these probabilities into Bayes rule (a computational sketch follows this list)
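A minimal Python sketch of these steps, using exact fractions (the numbers are those given above):

```python
from fractions import Fraction

# Steps 1 and 2: notation and the (conditional) probabilities from the description
p_f_given_s     = Fraction(1, 5)    # P(F|S): spam containing "free"
p_f_given_not_s = Fraction(1, 100)  # P(F|S-bar): non-spam containing "free"
p_s             = Fraction(2, 10)   # P(S): an email is spam
p_not_s         = 1 - p_s           # P(S-bar) = 8/10

# Steps 3 and 4: we ask for P(S|F); apply Bayes rule with the law of
# total probability in the denominator
p_f = p_f_given_s * p_s + p_f_given_not_s * p_not_s  # P(F)
p_s_given_f = p_f_given_s * p_s / p_f                # P(S|F)

print(p_s_given_f)         # 5/6
print(float(p_s_given_f))  # approximately 0.83
```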

Independence of events

Two Events \(A\) and \(B\) are independent when

\[ P(A|B) = P(A), \] otherwise they are dependent

In words, if \(P(A|B)=P(A)\) then:

  • observing \(B\) does not change the probability of observing \(A\)
  • observing \(B\) is uninformative for observing \(A\)
  • Events \(A\) and \(B\) do not carry mutual information

This implies:

\[P(A \cap B) = P(A) P(B)\]

and

\[ P(B|A) = P(B) \]

Application of (in)dependence of events

Or of (lack of) mutual information

  1. In a card game you have noticed that very many spades were put on the table.
    1. Are the suits of subsequent cards dependent on or independent of that observation?
    2. How does (in)dependence of these events determine your strategy?
  2. Long-term field studies have shown that in an ecosystem with frequent droughts, drought in summer does not correlate with good conditions for plant growth in spring.
    1. How does independence of these events determine the germination strategy of seeds?
  3. The winner of a quiz gets the choice of three closed doors. Behind one door is a car, behind the other two are rabbits. After the winner has made a choice, the quiz master peeks behind the scenes and opens one of the other two doors, showing a rabbit. He asks the winner whether he wants to change doors.
    1. Is the choice of door opened by the quiz master independent of or dependent on what is behind these doors?
    2. How should (in)dependence of these events determine the winner’s strategy?
    3. What is the probability of winning the car with this strategy? (A simulation sketch follows the list.)
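The third question is the Monty Hall problem, and the dependence can be made visible by simulation. A minimal Python sketch, assuming (as described above) that the quiz master always opens a door that hides a rabbit and was not chosen by the winner:

```python
import random

def play(switch, n_trials=100_000):
    """Fraction of games won, with or without switching doors."""
    wins = 0
    for _ in range(n_trials):
        doors = [0, 1, 2]
        car = random.choice(doors)
        choice = random.choice(doors)
        # The quiz master opens a door that is neither chosen nor hides the car
        opened = random.choice([d for d in doors if d != choice and d != car])
        if switch:
            # Switch to the one remaining closed door
            choice = next(d for d in doors if d != choice and d != opened)
        wins += (choice == car)
    return wins / n_trials

print(play(switch=False))  # approximately 1/3
print(play(switch=True))   # approximately 2/3
```

Because the quiz master's choice depends on what is behind the doors, his action is informative, and switching doubles the probability of winning the car.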

Probability distributions

Random variable

A random variable is a function that maps every outcome in a Sample space to a number (usually from \(\mathbb{N}\), \(\mathbb{Z}\) or \(\mathbb{R}\))

Reason: use these numbers as input for probability distribution functions to enable the calculation of probabilities

Example: toss of a coin,

\[ \begin{align*} \text{tail} & \rightarrow 0 \\ \text{head} & \rightarrow 1 \end{align*} \]

Example: TIGRFAM protein classes \[ \begin{align*} \text{TIGR00001} & \rightarrow 0 \\ \text{TIGR00002} & \rightarrow 1 \\ \vdots & \rightarrow \vdots \\ \text{TIGR04571} & \rightarrow 4570 \end{align*} \]

Example: glucose concentration

\[ g\,\text{mM} \rightarrow g \]

Definition

A probability distribution (function) is a function that assigns a probability to every Event in an Event space

Example, \(k\) pips when throwing a die:

\[ p(k) = \frac{1}{6} \qquad k \in \{1,\ldots,6\} \]

Example, probability of head in 1 toss of a coin (tail \(\rightarrow k=0\), head \(\rightarrow k=1\)):

\[ p(k) = \theta^{k}(1-\theta)^{1-k} \qquad k \in \{0,1\} \]

Example, number of heads \(k\) in \(n\) tosses of a coin:

\[ p(k) = {n \choose k} \theta^{k}(1-\theta)^{n-k} \qquad k \in \{0,\ldots,n\} \]
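A minimal Python sketch evaluating this pmf with math.comb, for example with \(n = 10\) and \(\theta = 0.5\) (an assumed choice of parameters); the probabilities sum to 1 over the Sample space:

```python
from math import comb

def binom_pmf(k, n, theta):
    """P(k heads in n tosses) = C(n, k) * theta^k * (1 - theta)^(n - k)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

n, theta = 10, 0.5
for k in range(n + 1):
    print(k, binom_pmf(k, n, theta))

# The pmf sums to 1 over the whole sample space {0, ..., n}
print(sum(binom_pmf(k, n, theta) for k in range(n + 1)))  # 1.0 (up to rounding)
```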

Discrete and continuous sample spaces

  • Discrete sample spaces: probability mass functions (pmf)
  • Continuous sample spaces: probability density functions (pdf)


Example of a pdf:

The normal distribution \(f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}\)

To calculate a probability using a pdf you have to integrate over an interval:

\[ P(X \in [a,b]) \equiv P(a \leq X \leq b) = \int_a^b f(x) dx \]

Important: a probability density is not a probability!
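A minimal Python sketch of this point: numerically integrating the standard normal pdf over \([-1, 1]\) gives a probability, which can be checked against the exact value \(\Phi(b) - \Phi(a)\), where \(\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)\):

```python
from math import exp, pi, sqrt, erf

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of the normal distribution."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

def prob_interval(a, b, n=10_000):
    """P(a <= X <= b) by midpoint-rule integration of the standard normal pdf."""
    h = (b - a) / n
    return sum(normal_pdf(a + (i + 0.5) * h) for i in range(n)) * h

def normal_cdf(z):
    """Exact standard normal cdf via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

a, b = -1.0, 1.0
print(prob_interval(a, b))            # approximately 0.6827
print(normal_cdf(b) - normal_cdf(a))  # 0.6826894921...

# A density is not a probability: with sigma = 0.1 the density at the
# mean is about 3.99, which is larger than 1
print(normal_pdf(0.0, sigma=0.1))
```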

Joint and marginal probabilities

Joint probabilities are probabilities of Events in a Sample space that is a Cartesian set product

Example: Members of parliament are Tory or Labour and Male or Female

\[ \begin{align*} &\Omega_\text{parliament} = \Omega_\text{sex} \times \Omega_\text{party membership} = \\ &\{\text{F}, \text{M}\} \times \{\text{T}, \text{L}\} = \{(\text{F},\text{T}),(\text{F},\text{L}),(\text{M},\text{T}),(\text{M},\text{L})\} \end{align*} \]

Joint probability distribution:

sex      Tory    Labour
Female   0.156   0.182
Male     0.477   0.185


Example of an Event: \(P(\{F\} \cap \{T\}) \equiv P(\{(F,T)\}) = 0.156\)

In the context of joint probabilities, a marginal probability is the result of applying the law of total probability

Example: \(P(\{F\}) = P(\{F\} \cap \{T\}) + P(\{F\} \cap \{L\}) = 0.156 + 0.182 = 0.338\)
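A minimal Python sketch computing the marginal probabilities from the joint table above by the law of total probability, plus one conditional probability:

```python
# Joint probability table P(sex, party) from the example above
joint = {
    ("F", "T"): 0.156, ("F", "L"): 0.182,
    ("M", "T"): 0.477, ("M", "L"): 0.185,
}

# Marginal probabilities: sum the joint probabilities over the other variable
p_sex   = {s: sum(p for (sex, _), p in joint.items() if sex == s) for s in "FM"}
p_party = {t: sum(p for (_, party), p in joint.items() if party == t) for t in "TL"}

print(p_sex["F"])    # 0.338 (up to floating-point rounding)
print(p_party["T"])  # 0.633

# A conditional probability from the joint table: P(T | F)
print(joint[("F", "T")] / p_sex["F"])  # approximately 0.462
```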

Example from biochemistry: activity of genes

(Figures: observed activities, and the underlying probability distribution functions.)

Conditional probability distributions

\[ p(y\,|\,x) = \frac{p(x,y)}{p(x)} \]


Example: \(p(y\,|\,x=450)\)

Expectations

Expected value of a random variable

The expected value of a random variable \(X\), written as \(\mathbb{E}[X]\), equals

\[ \mathbb{E}[X] = \sum_{x} x\, P(X = x) \]

  • It equals the population average of a random variable.
  • It is a property of the distribution function \(P(X)\) of \(X\).

Important: this differs from the sample mean \(\frac{1}{n} \sum_{i=1}^n {X_i}\) of a set of samples \(X_i\) of \(X\).
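A minimal Python sketch of the distinction for a fair die: the expected value is a fixed property of the distribution (\(7/2 = 3.5\)), while sample means of simulated rolls fluctuate around it:

```python
import random
from fractions import Fraction

# Expected value of a fair die: sum of x * P(X = x) over all outcomes
expected = sum(x * Fraction(1, 6) for x in range(1, 7))
print(expected)  # 7/2 = 3.5

# Sample means fluctuate around the expected value, less so for larger n
for n in (10, 1_000, 100_000):
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(n, sum(rolls) / n)
```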

Some important formulas

Expected value of a function of \(X\)

The expected value of a function \(g(X)\) of \(X\) equals

\[ \mathbb{E}[g(X)] = \sum_{x} g(x)\, P(X = x) \]

Expected value of a constant

\[ \mathbb{E}[a] = a \] in particular,

\[ \mathbb{E}[\mathbb{E}[X]] = \mathbb{E}[X] \]

Linearity of the expectation operator

\[ \begin{align*} \mathbb{E}[aX] &= a \mathbb{E}[X] \\ \mathbb{E}[X + Y] &= \mathbb{E}[X] + \mathbb{E}[Y] \end{align*} \]

Population variance

The population variance of a random variable \(X\) is defined as

\[ \text{var}(X) = \mathbb{E} \left[ (X - \mathbb{E}[X])^2 \right] \]

Alternative (equivalent) definition:

\[ \text{var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \]
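A minimal Python sketch verifying that the two definitions agree for a fair die, where \(\text{var}(X) = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}\):

```python
from fractions import Fraction

outcomes = range(1, 7)
p = Fraction(1, 6)  # fair die: equal probability for each outcome

e_x  = sum(x * p for x in outcomes)     # E[X] = 7/2
e_x2 = sum(x**2 * p for x in outcomes)  # E[X^2] = 91/6

var_def1 = sum((x - e_x) ** 2 * p for x in outcomes)  # E[(X - E[X])^2]
var_def2 = e_x2 - e_x**2                              # E[X^2] - E[X]^2

print(var_def1, var_def2)  # both 35/12
```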

Independent, identically distributed samples

A set of independent samples \(X_i\) drawn from the distribution of a random variable \(X\) is called a set of independent, identically distributed (i.i.d.) samples.

  • Independent: the value of \(X_j\) is independent of the values of \(X_i\) where \(i\neq j\).
  • Identically distributed: all \(X_i\) are random variables with the same properties as \(X\). In particular

\[ \forall{i}: \quad \mathbb{E}[X_i] = \mathbb{E}[X] \]

Unbiased estimators

For an unbiased estimator \(\hat{Z}\) of a random variable \(Z\) the following must hold:

\[ \mathbb{E}[\hat{Z}] = \mathbb{E}[Z] \]

Suppose \(X_i\) is a set of i.i.d. samples of \(X\)

The sample mean \(\overline{X} = \frac{1}{n}\sum_{i=1}^n{X_i}\) is an unbiased estimator of \(\mathbb{E}[X]\), i.e.

\[ \mathbb{E} [ \overline{X} ] = \mathbb{E}[\mathbb{E}[X]] = \mathbb{E}[X] \]

Proof: by linearity of the Expectation operator,

\[ \mathbb{E}[\overline{X}] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}[X_i] = \frac{1}{n} \cdot n \, \mathbb{E}[X] = \mathbb{E}[X] \]

The sample variance \(s^2(X_i) = \frac{1}{n-1}\sum\left( X_i - \overline{X}\right)^2\) is an unbiased estimator of \(\text{var}(X)\) , i.e.

\[ \mathbb{E}\left[ \frac{1}{n-1} \sum_{i=1}^n \left( X_i - \overline{X} \right)^2 \right] = \mathbb{E}[\text{var}(X)] = \mathbb{E} \left[ (X - \mathbb{E}[X])^2 \right] \]

Proof: see exercise in syllabus.
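A minimal Python simulation sketch of this result (assuming standard normal samples, so \(\text{var}(X) = 1\)): averaging many sample variances with the \(n-1\) divisor recovers \(\text{var}(X)\), while dividing by \(n\) systematically underestimates it by a factor \(\frac{n-1}{n}\):

```python
import random

def sample_variance(xs, ddof=1):
    """Sample variance with divisor n - ddof; ddof=1 gives the unbiased estimator."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

random.seed(1)
n, n_repeats = 5, 100_000
sum_unbiased = sum_biased = 0.0
for _ in range(n_repeats):
    xs = [random.gauss(0, 1) for _ in range(n)]  # i.i.d. samples, var(X) = 1
    sum_unbiased += sample_variance(xs, ddof=1)
    sum_biased   += sample_variance(xs, ddof=0)

print(sum_unbiased / n_repeats)  # approximately 1.0
print(sum_biased / n_repeats)    # approximately (n - 1) / n = 0.8
```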

Appendix

Answers

Answers to Conditional probability

  1. \(2/64 = 1/32\)
  2. \(\frac{2/64}{1/4} = 8/64 = 1/8\)
  3. \(\frac{4/64}{6/64} = 4/6 = 2/3\)
  4. \(\frac{13/64}{1/4} = 13/16\)

Answers to Product rule and total probability

  1. \[ P(Arg \cap C_{start}) = P(C_{start}|Arg) \cdot P(Arg) = \frac{2}{3} \cdot \frac{1}{20} = \frac{1}{30} \]
  2. \(\{\{\text{aa-codon}\}, \{\text{stop-codon}\}\}\) is a partition set of the set of all possible outcomes of encodings by a codon (note that a start codon also encodes an amino acid) \[ P(U_{start}) = P(U_{start} \cap \text{aa-codon}) + P(U_{start} \cap \text{stop-codon}) = \frac{2}{30} + \frac{1}{60} = \frac{5}{60} = \frac{1}{12} \]

Answers to Bayes rule

\[ P(S|F) = \frac{P(F|S) P(S)}{P(F|S)P(S) + P(F|\overline{S})P(\overline{S})} = \frac{\frac{1}{5}\cdot\frac{2}{10}}{\frac{1}{5}\cdot\frac{2}{10}+\frac{1}{100}\cdot\frac{8}{10}} = \frac{40}{48} = \frac{5}{6} \approx 0.83 \]

Literature

Messerli, Franz H. 2012. “Chocolate Consumption, Cognitive Function, and Nobel Laureates.” New England Journal of Medicine 367 (16): 1562–64. https://doi.org/10.1056/nejmon1211064.